Transient Fault Tolerance via Dynamic Process-Level Redundancy

Authors

  • Alex Shye
  • Vijay Janapa Reddi
  • Tipp Moseley
  • Daniel A. Connors
Abstract

Transient faults are emerging as a critical concern for microprocessor reliability. Hardware techniques are often used for transient fault tolerance, but software techniques offer a more cost-effective and flexible alternative. This paper proposes a software approach to transient fault tolerance that uses a run-time system to automatically apply process-level redundancy (PLR). PLR creates a set of redundant processes for each application process and compares them at run time to guarantee correct execution. Because redundancy is applied at the process level, the operating system can freely schedule the processes across all available hardware resources (i.e., extra threads or cores). PLR is a software-centric approach: the focus shifts from ensuring correct hardware execution to ensuring correct software execution. This lets PLR ignore many benign faults that never propagate to the program output. In addition, dynamic deployment yields a very flexible fault-tolerant system that applies PLR transparently, without prior modification of the application, shared libraries, or operating system. Experiments with a real PLR prototype on an SMP machine demonstrate that PLR can effectively provide transient fault tolerance with a slowdown of only 1.26x.
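
The core idea, running redundant copies of a process and comparing them, can be illustrated with a small sketch. This is not the authors' run-time system: it assumes a POSIX platform with `os.fork`, compares only each replica's final result rather than comparing during execution, and the names `run_redundant` and `inject` are hypothetical. The `inject` hook exists only to simulate a transient fault in one replica.

```python
import os
import pickle
from collections import Counter


def run_redundant(fn, args=(), replicas=3, inject=None):
    """Run fn(*args) in `replicas` forked processes and majority-vote the results.

    `inject(index, value)` is a hypothetical hook used only to simulate a
    fault by altering one replica's result before it is reported.
    """
    children = []
    for index in range(replicas):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute the result, send it to the parent, then exit.
            os.close(read_fd)
            try:
                value = fn(*args)
                if inject is not None:
                    value = inject(index, value)
                with os.fdopen(write_fd, "wb") as out:
                    out.write(pickle.dumps(value))
            finally:
                os._exit(0)  # never fall back into the parent's code
        os.close(write_fd)
        children.append((pid, read_fd))

    outputs = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as inp:
            data = inp.read()
        os.waitpid(pid, 0)
        if data:  # a replica that produced no output casts no vote
            outputs.append(data)

    if not outputs:
        raise RuntimeError("no replica produced a result")
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority among replicas")
    return pickle.loads(winner), votes


if __name__ == "__main__":
    # Replica 0 is made to report a wrong value; the other two outvote it.
    faulty = lambda i, v: v + 1 if i == 0 else v
    print(run_redundant(sum, ([1, 2, 3],), inject=faulty))
```

Because each replica is an ordinary process, the operating system is free to place the replicas on different cores, which is the scheduling flexibility the abstract describes. With three replicas a single faulty result is outvoted; with two, a mismatch could only be detected, not corrected.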

Similar resources

Configurable Transient Fault Detection via Dynamic Binary Translation

Smaller feature sizes, lower voltage levels, and reduced noise margins have helped improve the performance and lower the power consumption of modern microprocessors. These same advances have made processors more susceptible to transient faults that can corrupt data and make systems unavailable. Designers often compensate for transient faults by adding hardware redundancy and making circuit and p...

Design and Analysis of Transient Fault Tolerance for Multi Core Architecture

This paper describes a software approach to fault tolerance for shared-memory multi-core systems using PLR. PLR takes a software-centric approach to transient fault tolerance that ensures correct software execution. The scheme operates at the user-space level and does not require changes to the original application. PLR creates a set of redundant processes per application process. In this scheme ...

Exploiting Instruction Redundancy for Transient Fault Tolerance

This paper presents an approach for integrating fault-tolerance techniques into microprocessors by utilizing instruction redundancy as well as time redundancy. Smaller and smaller transistors, higher and higher clock frequency, and lower and lower power supply voltage reduce reliability of microprocessors. In addition, microprocessors are used in systems which require high dependabi...

Energy-Aware Synthesis of Fault-Tolerant Schedules for Real-Time Distributed Embedded Systems

In this paper we present an approach to the scheduling and voltage scaling of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. Addressing simultaneously energy and reliability is especially challenging because l...

Fault-Tolerant Dynamic Systems

Modular redundancy (system replication) and other traditional techniques for fault tolerance in dynamic systems are expensive, and rely heavily — particularly in the case of systems operating over extended time horizons — on the assumption that the error-correcting mechanism (e.g., voting) is fault-free. Herein, we construct redundant dynamic systems in a way that achieves tolerance to transien...

Publication date: 2006